Bioinformatics (Thomas Dandekar, Meik Kunz)

Table 8.1 Degree of difficulty of a P problem (sequence alignment) compared to an NP problem

(protein folding)

Algorithm

Runtime complexity (m, n = sequence length of a, b)

Heuristic algorithms:

Blast

O(n*m)

Dynamic algorithms:

Needleman-Wunsch

Cubic: O(n³); e.g. at 5 = 125

Smith-Watermann

Quadratic: O(n²); e.g. with 5 = 25

Protein folding:

For x possible folds

Exponential: xn (e.g. for 2 convolutions: 2n; for 7

convolutions: 7n)

8.1

8.3

Informatic Solutions for Computationally Intensive

Bioinformatics Problems

Many interesting problems in biology and bioinformatics have a built-in combinatorics

and thus a very large, difficult to understand solution space, which therefore has the diffi

culty NP (solution very difficult to find and computation time not foreseeable - if you show

me the solution, I can usually confirm it relatively quickly). All in all, however, computa

tional time problems are computer science problems, which can therefore also be tackled

directly with tools from computer science and computer technology.

Tip 1: Use Modern Computer

This is often effective in practice. First, if you have a difficult or computationally intensive

bioinformatics problem, you should not use a web server (otherwise you might wait until

you black out!). However, most bioinformaticians have already taken this into account

when designing their programs. Protein structure predictions, for example, are often not

done online on the web server, but one receives (after a few hours or even days) the result

by e-mail (for example, when using SWISS-MODEL for homology models or ab-initio

predictions by the QUARK software from the Zhang lab). For own calculations I should

first use a notebook or PC as up-to-date as possible. Workstations or small computer clus

ters have even more computing power at first. For larger calculations, local (university

mainframes) or central computer clusters (e.g. Leibniz Computing Centers in Munich,

etc.) are then available. Tier 1 or Tier 0 mainframes such as JUQUEEN in Jülich then

provide the greatest performance (6 million billion floating point operations per second)

with 5.9 petaflops per second (https://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercom

puters/JUQUEEN/JUQUEEN_node.html).

8.3 Informatic Solutions for Computationally Intensive Bioinformatics Problems